Abstract

We introduce an online version of the multiselection problem, in which q selection queries are 
requested on an unsorted array of n elements. We provide the first online algorithm that is
1-competitive with Kaligosi et al. [ICALP 2005] in terms of comparison complexity. Our algorithm
also supports online search queries efficiently. 

We then extend our algorithm to the dynamic setting, while retaining online functionality, 
by supporting arbitrary insertions and deletions on the array. Assuming that the insertion of 
an element is immediately preceded by a search for that element, we show that our dynamic 
online algorithm performs an optimal number of comparisons, up to lower order terms and an 
additive $O(n)$ term.

For the external memory model, we describe the first online multiselection algorithm that is 
$O(1)$-competitive. This result improves upon the work of Sibeyn [Journal of Algorithms 2006]
when $q > m$, where $m$ is the number of blocks that can be stored in main memory. We also
extend it to support searches, insertions, and deletions of elements efficiently. 
1 Introduction



The multiselection problem asks for the elements of rank $Q = \{q_1, q_2, \ldots, q_q\}$ on an unsorted array $A$
drawn from an ordered universe of elements. We define $B(S_q)$ as the information-theoretic lower
bound on the number of comparisons needed to answer $q$ queries, where $S_q = \{s_i\}$ denotes the queries
ordered by rank. We define $\Delta_i = s_{i+1} - s_i$, where $s_0 = 0$ and $s_{q+1} = n$. Then,

$$B(S_q) = \log n! - \sum_{i=0}^{q} \log(\Delta_i!) = \sum_{i=0}^{q} \Delta_i \log\frac{n}{\Delta_i} - O(n).$$ ¹
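As a rough illustration of the bound above, the following sketch evaluates both forms for a concrete set of ranks. The function names are hypothetical, ranks are assumed to be integers in 1..n, and the entropy form omits the $O(n)$ term, so the two values are only expected to agree up to an additive $O(n)$.

```python
import math


def log2_factorial(k):
    """Return log2(k!) using the log-gamma function."""
    return math.lgamma(k + 1) / math.log(2)


def query_gaps(n, ranks):
    """Gaps Delta_i = s_{i+1} - s_i with sentinels s_0 = 0 and s_{q+1} = n."""
    s = [0] + sorted(set(ranks)) + [n]
    return [b - a for a, b in zip(s, s[1:])]


def multiselect_lower_bound(n, ranks):
    """Exact form: log2(n!) - sum_i log2(Delta_i!)."""
    return log2_factorial(n) - sum(log2_factorial(g) for g in query_gaps(n, ranks))


def entropy_form(n, ranks):
    """Approximate form: sum_i Delta_i * log2(n / Delta_i), without the O(n) term."""
    return sum(g * math.log2(n / g) for g in query_gaps(n, ranks) if g > 0)
```

With no queries the exact form is zero, and with a query at every rank it equals $\log n!$, the cost of sorting.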
Several papers have analyzed this problem carefully. Dobkin and Munro [DM81] gave a
deterministic algorithm using $3B(S_q) + O(n)$ comparisons. Prodinger [Pro95] proved that the expected
number of comparisons with random pivoting is $2B(S_q)\ln 2 + O(n)$. More recently, Kaligosi et al. [KMMS05]
showed a randomized algorithm taking $B(S_q) + O(n)$ expected comparisons, along with a
deterministic algorithm taking $B(S_q) + o(B(S_q)) + O(n)$ comparisons. Jimenez and Martinez [JM10] later
improved the number of comparisons in the expected case to $B(S_q) + n + o(n)$. Most recently,
Cardinal et al. [CFJ+09] generalized the problem to partial order production, of which multiselection
is a special case. Cardinal et al. use the multiselection algorithm as a subroutine after an initial
preprocessing phase.

Kaligosi et al. [KMMS05] provide an elegant result in the deterministic case based on tying the
number of comparisons required for merging two sorted sequences to the information content of
those sequences. This simple observation drives an approach in which these runs are manipulated both to
find pivots that are "good enough" and to partition with a near-optimal number of comparisons. The
weakness of the approaches in internal memory is that they must know all of the queries a priori.

In external memory, Sibeyn [Sib06] solves multiselection using $n + nq/m^{1-\epsilon}$ I/Os, where $\epsilon$ is
any positive constant. The first term comes from creating a static index structure using $n$ I/Os,
and the remainder comes from the $q$ searches in that index. In addition, his results also require the
condition that $\log N = O(B)$. When $q = m$, Sibeyn's multiselection algorithm requires $O(nm^{\epsilon})$
I/Os, whereas the optimum is $O(n)$ I/Os. In fact, his bounds are $\omega(B_m(S_q))$ for any $q > m$, where
$B_m(S_q)$ is the lower bound on the number of I/Os required (see Section 6 for the definition).



1.1 Our Results 

For the multiselection problem in internal memory, we describe the first online algorithm that
supports a set $Q$ of $q$ selection, search, insert, and delete operations, of which $q'$ are search,
insert, and delete, using $B(S_q) + o(B(S_q)) + O(n + q'\log n)$ comparisons.² Thus our algorithm
is 1-competitive with the offline algorithm of Kaligosi et al. [KMMS05] in terms of comparison
complexity. We also show a randomized result achieving 1-competitive behavior with respect to
Kaligosi et al. [KMMS05], while only using $O((\log n)^{O(1)})$ sampled elements instead of $O(n^{3/4})$.

For the external memory model, we describe an online multiselection algorithm that supports a
set $Q$ of $q$ selection queries on an unsorted array stored on disk in $n$ blocks, using $O(B_m(S_q)) + O(n)$
I/Os, where $B_m(S_q)$ is a lower bound on the number of I/Os required to support the given queries.
This result improves upon the work of Sibeyn [Journal of Algorithms 2006] when $q > m$, where $m$
is the number of blocks that can be stored in main memory. We also extend it to support insertions
and deletions of elements using $O(B_m(S_q)) + O(n + q\log_B N)$ I/Os.

1 We use the notation $\log_b a$ to refer to the base $b$ logarithm of $a$. By default, we let $b = 2$. We also define $\ln b$ as
the base $e$ logarithm of $b$.

2 For the dynamic result, we assume that the insertion of an element is immediately preceded by a search for that
element. In that case, we show that our dynamic online algorithm performs an optimal number of comparisons, up
to lower order terms and an additive $O(n)$ term.

1.2 Preliminaries 

Given an unsorted array $A$ of length $n$, the median is the element $x$ of $A$ such that exactly $\lceil n/2 \rceil$
elements in $A$ are greater than or equal to $x$. It is well known that the median can be computed
in $O(n)$ time, and many authors [Hoa61, BFP+73, SPP76] have analyzed the exact constants involved. The
best known result, due to Dor and Zwick [DZ99], uses $2.942n + o(n)$ comparisons.

In the external memory model, the computer is abstracted to consist of two memory levels: 
the internal memory of size M, and the (unbounded) disk memory, which operates by reading 
and writing data in blocks of size B. We refer to the number of items of the input by N. For 
convenience, we define n = N/B and m = M/B as the number of blocks of input and memory, 
respectively. We make the reasonable assumption that $1 \le B \le M/2$. In this model, we assume
that each I/O read or write is charged one unit of time, and that an internal memory operation
is charged no units of time. To achieve the optimal sorting bound of $\mathrm{SortIO}(N) = \Theta(n\log_m n)$
in this setting, it is necessary to make the tall cache assumption [BF03]: $M = \Omega(B^{1+\epsilon})$, for some
constant $\epsilon > 0$, and we will make this assumption for the remainder of the paper.

2 A Simple Online Algorithm 

Let $A$ be an input array of $n$ unsorted items. We describe a simple version of our algorithm for
handling selection and search queries on array $A$. We say that an element in array $A$ at position $i$
is a pivot if $A[1 \ldots i-1] < A[i] < A[i+1 \ldots n]$.

Bit Vector. Throughout all the algorithms in the paper, we maintain a bitvector $B$ of length $n$
where $B[i] = 1$ if and only if $A[i]$ is a pivot.

Preprocessing. Create a bitvector $B$ and set each bit to 0. Find the minimum and maximum
elements in array $A$, swap them into $A[1]$ and $A[n]$ respectively, and set $B[1] := B[n] := 1$.

Selection. We define the operation A.select(s) to refer to the selection query $s$, which returns
$A[s]$ if $A$ were sorted. To compute this result, if $B[s] = 1$ then return $A[s]$ and we are done. If
$B[s] = 0$, find $a < s$ and $b > s$ such that $B[a] = B[b] = 1$ but $B[a+1 \ldots b-1]$ are all 0. Perform
quickselect [Hoa61] on $A[a+1 \ldots b-1]$, marking pivots found along the way in $B$. This gives us
$A[s]$, with $B[s] = 1$, as desired.

Search. We define the operation A.search(p), which returns the position $j$ that satisfies $p = A[j]$ if $A$
were sorted; if $p \notin A$, then $j$ is the number of items in $A$ smaller than $p$. Perform a binary search
on $A$ as if $A$ were sorted. Let $i$ be the location in $A$ we find from the search; if along the way we
discovered endpoints for the subarray we are searching that were out of order, stop the search and
let $i$ be the midpoint. If $A[i] = p$ and $B[i] = 1$, return $i$ and we are done. Otherwise, we have just
identified the unsorted interval in $A$ that contains $p$ if it is present. Perform a selection query on
this interval; choose the side of a pivot on which to recurse based on the value of $p$ (instead of
an array position as would be done in a normal selection query). As above, we mark pivots in $B$ as
we go; at the end of the recursion we will discover the needed value $j$.

As queries arrive, our algorithm performs the same steps that quicksort would perform, although 
not necessarily in the same order. If we receive enough queries, we will, over time, perform a 
quicksort on array A. This also means that our recursive subproblems mimic those from quicksort. 
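The preprocessing, selection, and search steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class and method names are hypothetical, indices are 0-based, the last item of an interval is used as the pivot, and the enclosing pivots are found by a linear scan of the bitvector (which costs time but no element comparisons). The search method is a simplified variant: whenever its binary-search midpoint lands in an unsorted interval, it partitions that interval once and continues from the new pivot.

```python
class OnlineSelector:
    """Simple online selection and search over an array (0-based indices)."""

    def __init__(self, items):
        self.a = list(items)
        n = len(self.a)
        if n == 0:
            raise ValueError("array must be non-empty")
        self.is_pivot = [False] * n          # the bitvector B of the text
        # Preprocessing: move the minimum to the front and the maximum to the back.
        self._swap(0, min(range(n), key=self.a.__getitem__))
        self._swap(n - 1, max(range(n), key=self.a.__getitem__))
        self.is_pivot[0] = self.is_pivot[n - 1] = True

    def _swap(self, i, j):
        self.a[i], self.a[j] = self.a[j], self.a[i]

    def _enclosing(self, i):
        """Nearest pivots to the left and right of a non-pivot position i."""
        lo, hi = i, i
        while not self.is_pivot[lo]:
            lo -= 1
        while not self.is_pivot[hi]:
            hi += 1
        return lo, hi

    def _partition(self, lo, hi):
        """Partition a[lo..hi] around its last item; mark and return the pivot index."""
        x, store = self.a[hi], lo
        for i in range(lo, hi):
            if self.a[i] < x:
                self._swap(i, store)
                store += 1
        self._swap(store, hi)
        self.is_pivot[store] = True
        return store

    def select(self, s):
        """Return the element of rank s (0-based) in sorted order."""
        if not self.is_pivot[s]:
            lo, hi = self._enclosing(s)
            lo, hi = lo + 1, hi - 1
            while True:                      # quickselect restricted to the interval
                k = self._partition(lo, hi)
                if k == s:
                    break
                if s < k:
                    hi = k - 1
                else:
                    lo = k + 1
        return self.a[s]

    def search(self, p):
        """Return the number of items smaller than p (the sorted position of p if present)."""
        n = len(self.a)
        if p <= self.a[0]:
            return 0
        if p > self.a[n - 1]:
            return n
        lo, hi = 0, n - 1                    # invariant: pivots with a[lo] < p <= a[hi]
        while hi - lo > 1:
            k = (lo + hi) // 2
            if not self.is_pivot[k]:         # midpoint is inside an unsorted interval
                left, right = self._enclosing(k)
                k = self._partition(left + 1, right - 1)
            if self.a[k] < p:
                lo = k
            else:
                hi = k
        return hi
```

Each query only partitions the intervals it touches, so issuing a selection for every rank leaves the underlying array fully sorted, matching the quicksort analogy in the text.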

3 The search operation is essentially the same as rank on the set of elements stored in the array $A$. We call it
search to avoid confusion with the rank operation defined on bitvectors in Section 5.
We have assumed, up to this point, that the last item in an interval is used as the pivot, and
a simple linear-time partition algorithm is used. We explore using different pivot and partitioning
strategies to obtain various complexity results for online selection and searching. As an easy
consequence of a more precise analysis to follow, we show that the time to perform $q$ select
and search queries on an array of $n$ items is $O(n\log q + q\log n)$. Now, we define terminology for
this alternate analysis.

2.1 Terminology 

For now we assume all queries are selection queries, since search queries are selection queries with
a binary search preprocessing phase taking $O(\log n)$ comparisons. We explicitly bound the binary
search cost in our remaining results.

Query and Pivot Sets. Let $Q$ denote a sequence of $q$ selection queries, ordered by time of
arrival. Let $S_t = \{s_i\}$ denote the first $t$ queries from $Q$, sorted by position. We also include $s_0 = 1$
and $s_{t+1} = n$ in $S_t$ for convenience of notation, since the minimum and maximum are found during
preprocessing. Let $P_t = \{p_i\}$ denote the set of $k$ pivots found by the algorithm when processing
$S_t$, again sorted by position. Note that $p_1 = 1$, $p_k = n$, $B[p_i] = 1$ for all $i$, and $S_t \subseteq P_t$.

Pivot Tree, Recursion Depth, and Intervals. The pivots chosen by the algorithm form a
binary tree structure, defined as the pivot tree $T$ of the algorithm over time.⁴ Pivot $p_i$ is the parent
of pivot $p_j$ if, after $p_i$ was used to partition an interval, $p_j$ was the pivot used to partition either
the right or left half of that interval. The root pivot is the pivot used to partition $A[2 \ldots n-1]$ due to
preprocessing. The recursion depth, $d(p_i)$, of a pivot $p_i$ is the length of the path in the pivot tree
from $p_i$ to the root pivot. All leaves in the pivot tree are also selection queries, but it may be the
case that a query is not a leaf. Each pivot was used to partition an interval in $A$. Let $I(p_i)$ denote
the interval partitioned by $p_i$ (which may be empty), and let $|I(p_i)|$ denote its length. Intervals
form a binary tree induced by their pivots. If $p_i$ is an ancestor of $p_j$ then $I(p_j) \subset I(p_i)$. The
recursion depth of an array element is the recursion depth of the smallest interval containing that
element, which in turn is the recursion depth of its pivot.

Gaps and Entropy. Define the query gap $\Delta_i^{S_t} := s_{i+1} - s_i$ and similarly the pivot gap
$\Delta_i^{P_t} := p_{i+1} - p_i$. Observe that each pivot gap is contained in a smallest interval $I(p)$. One endpoint of
this gap is the pivot $p$ of interval $I(p)$, and the other matches one of the endpoints of interval $I(p)$.
By telescoping we have $\sum_i \Delta_i^{S_t} = \sum_j \Delta_j^{P_t} = n - 1$.

We will analyze the complexity of our algorithms based on the number of element comparisons. 
The lower bound on the number of comparisons required to answer the selection queries in St is 
obtained by taking the number of comparisons to sort the entire array, and then subtracting the 
comparisons needed to sort the query gaps. We use B(St) to denote this lower bound. 

$$B(S_t) := \sum_{i=0}^{t} \Delta_i^{S_t} \log\left(n/\Delta_i^{S_t}\right) - O(n).$$

Note that $B(S_q) \le n\log q$: this upper bound is met when the queries are evenly spaced over the
input array $A$. We can show that the simple algorithm performs $O(B(S_q) + q\log n)$ comparisons for a sequence $Q$
of $q$ select and search queries on an array of $n$ elements. We will also make use of the following fact
in the paper.

4 Intuitively, a pivot tree corresponds to a recursion tree, since each node represents one recursive call made during
the quickselect algorithm [Hoa61].

Fact 1. For all $\epsilon > 0$, there exists a constant $c_\epsilon$ such that for all $x \ge 4$, $\log\log\log x < \epsilon\log x + c_\epsilon$.

Proof. Since $\lim_{x \to \infty} (\log\log\log x)/(\log x) = 0$, there exists a $k_\epsilon$ such that for all $x \ge k_\epsilon$, we
know that $(\log\log\log x)/(\log x) < \epsilon$. Also, we know that in the interval $[4, k_\epsilon]$, the continuous
function $\log\log\log x - \epsilon\log x$ is bounded. Let $c_\epsilon = \log\log\log k_\epsilon - 2\epsilon$, which is a constant. □



3 A Lemma on Sorting Entropy 

Pivot Selection Methods. We say that a pivot selection method is good for the constant $c$ with
$1/2 \le c < 1$ if, for all pairs of pivots $p_i$ and $p_j$ where $p_i$ is an ancestor of $p_j$ in the pivot tree,

$$|I(p_j)| \le |I(p_i)| \cdot c^{\,d(p_j) - d(p_i) + O(1)}.$$

Note that if the median is always chosen as the pivot, we have $c = 1/2$ and the $O(1)$ term is in fact
zero. The pivot selection method of Kaligosi et al. [KMMS05, Lemma 8] is good with $c = 15/16$.

Lemma 1. If the pivot selection method is good as defined above, then $B(P_t) = B(S_t) + O(n)$.

Proof. We sketch the proof and defer the full details to Appendix B. Consider any two consecutive
selection queries $s$ and $s'$, and let $\Delta = s' - s$ be the gap between them. Let $P_\Delta = (p_l, p_{l+1}, \ldots, p_r)$
be the pivots in this gap, where $p_l = s$ and $p_r = s'$. The lemma follows from the claim that
$B(P_\Delta) = O(\Delta)$, since

$$\begin{aligned}
B(P_t) - B(S_t) &= \Big(n\log n - \sum_{j} \Delta_j^{P_t}\log\Delta_j^{P_t}\Big) - \Big(n\log n - \sum_{i=0}^{t} \Delta_i^{S_t}\log\Delta_i^{S_t}\Big) \\
&= \sum_{i=0}^{t} \Delta_i^{S_t}\log\Delta_i^{S_t} - \sum_{j} \Delta_j^{P_t}\log\Delta_j^{P_t} \\
&= \sum_{i=0}^{t} B\big(P_{\Delta_i^{S_t}}\big) = \sum_{i=0}^{t} O\big(\Delta_i^{S_t}\big) = O(n).
\end{aligned}$$

We now sketch the proof of our claim, which proves the lemma.

There must be a unique pivot $p_m$ in $P_\Delta$ of minimal recursion depth. We split the gap $\Delta$ at $p_m$.
For brevity, we define $D_l = \sum_{i=l}^{m-1} \Delta_i$ and $D_r = \sum_{i=m}^{r-1} \Delta_i$, giving $\Delta = D_l + D_r$.

We consider the proof for the right-hand side $D_r$; the proof for $D_l$ is similar. Since we use a good
pivot selection method, we can bound the total information content of the right-hand side by $O(D_r)$.
This leads to the claim, and the proof follows. Details of this proof are in Appendix B. □

Theorem 1 (Online Multiselection). Given an array of n elements, on which we have performed 
a sequence Q of q online selection and search queries, of which q' are search, we provide 

• a randomized online algorithm that performs the queries using $B(S_q) + O(n + q'\log n)$ expected
number of comparisons, and

• a deterministic online algorithm that performs the queries using at most $4B(S_q) + O(n + q'\log n)$
comparisons.
Proof. For the randomized algorithm, we use the randomized pivot selection algorithm of Kaligosi et
al. [KMMS05, Section 3, Lemma 2]. This algorithm gives a good pivot selection method with
$c = 1/2 + o(1)$, and the time to choose the pivot is $O(\Delta^{3/4})$ on an interval of length $\Delta$, which is
subsumed in the $O(n)$ term in the running time. Each element in an interval participates in one
comparison per partition operation. Thus, the total number of comparisons is expected to be the
sum of the recursion depths of all elements in the array. This total is easily shown to be $B(P_q)$, and
by Lemma 1 the proof is complete. In Appendix A we describe how to get a good pivot selection
method with just $6(\log n)^3(\log\Delta)^2$ samples, instead of $O(\Delta^{3/4})$.

For the deterministic algorithm, we use the median of each interval as the pivot; the
median-finding algorithm of Dor and Zwick [DZ99] gives this to us in under $3\Delta$ comparisons. We add
another comparison for the partitioning, to give a count of comparisons per array element of four
times the recursion depth. This is at most $4B(P_q)$, which is no more than $4B(S_q) + O(n)$ by
Lemma 1, and the result follows. □

4 Optimal Online Multiselection 

In this section we prove the following theorem. 

Theorem 2 (Optimal Online Multiselection). Given an unsorted array $A$ of $n$ elements, we provide
a deterministic algorithm that supports a sequence $Q$ of $q$ online selection and search queries, of
which $q'$ are search, using $B(S_q)(1 + o(1)) + O(n + q'\log n)$ comparisons in the worst case.

Note that our bounds match those of the offline algorithm of Kaligosi et al. [KMMS05] when
$q' = 0$ (i.e., there are no search queries). In other words, we provide the first 1-competitive
online multiselection algorithm. We explain our proof in three main steps. We first explain our
algorithm and how it is different from the algorithm in [KMMS05]. We then bound the number of
comparisons from merging by $B(S_q)(1 + o(1)) + O(n)$, and then we bound the number of comparisons
from pivot finding and partitioning by $o(B(S_q)) + O(n)$.

4.1 Algorithm Description and Modifications 

We briefly describe the deterministic algorithm from Kaligosi et al. [KMMS05]. They begin by
creating runs, which are sorted sequences from $A$ of length roughly $\ell = \log(B/n)$, where $B = B(S_q)$. Then, they
compute the median $m$ of the medians of these sequences and partition the runs based on $m$. After
partitioning, they recurse on the two sets of runs, sending select queries to the appropriate side
of the recursion. To maintain the invariant on run length in the recursions, they merge short
like-sized runs optimally until all but $\ell$ of the runs are again of length between $\ell$ and $2\ell$.
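The first two steps of that description (forming sorted runs and taking the median of their medians) can be illustrated with a small sketch. The helper names are hypothetical, and for brevity the medians are found by sorting rather than by a linear-time median algorithm, so this shows the structure of the step and not its comparison cost.

```python
def make_runs(items, run_len):
    """Split items into consecutive chunks of length run_len and sort each chunk."""
    return [sorted(items[i:i + run_len]) for i in range(0, len(items), run_len)]


def median_of_medians(runs):
    """Return the (lower) median of the runs' (lower) medians."""
    medians = sorted(run[(len(run) - 1) // 2] for run in runs)
    return medians[(len(medians) - 1) // 2]
```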

We make the following modifications to the deterministic algorithm of Kaligosi et al. [KMMS05]:

• The queries are processed online, that is, one at a time, from Q without knowing which 
queries will follow. To do this, we maintain the bitvector B as described above. 

• We admit search queries in addition to selection queries; in the analysis we treat them as
selection queries, paying $O(q'\log n)$ comparisons to account for binary search.

• Since we don't know all of $Q$ at the start, we cannot know the value of $B(S_q)$ in advance.
Therefore, we cannot preset a value for $\ell$ as in Kaligosi et al. [KMMS05]. Instead, we set $\ell$
locally in an interval $I(p)$ to $1 + \lfloor\lg(d(p) + 1)\rfloor$. Thus, $\ell$ starts at 1 at the root of the pivot
tree $T$, and since we use only good pivots, $d(p) = O(\lg n)$. (Also, $\ell = \log\log n + O(1)$ in the
worst case.) We keep track of the recursion depth of pivots, from which it is easy to compute
the recursion depth of an interval. Also observe that $\ell$ can increase by at most one when
moving down one recursion level during a selection.

• We use a second bitvector R to identify the endpoints of runs within each interval that has 
not yet been partitioned. 

The algorithm to perform a selection query is as follows: 

• As described earlier in this paper, we use bitvector B to identify the interval from which to 
begin processing. The minimum and maximum are found in preprocessing. 

• If the current interval has length less than $4\ell^2$, we sort the interval to complete the query
(setting all elements as pivots). The cost for this case is bounded by Lemma 5.

• As is done in [KMMS05], we compute the value of $\ell$ for the current interval, merge runs so
that there is at most one of each length $< \ell$, and then use medians of those runs to compute
a median-of-medians to use as a pivot. We then partition each run using binary search.
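The depth-dependent run length and the small-interval cutoff used in the steps above are simple to compute. The following sketch uses hypothetical function names and relies on the identity $\lfloor\lg x\rfloor = \mathrm{bit\_length}(x) - 1$ for integers $x \ge 1$.

```python
def run_length_for_depth(d):
    """ell = 1 + floor(lg(d + 1)) for a pivot at recursion depth d >= 0."""
    return (d + 1).bit_length()


def small_interval_threshold(d):
    """Intervals shorter than 4 * ell^2 are sorted directly instead of partitioned."""
    ell = run_length_for_depth(d)
    return 4 * ell * ell
```

Because the run length depends only on the depth of the current interval, it can be recomputed on demand and never requires knowing the full query sequence in advance.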

We can borrow much of the analysis done in [KMMS05]. We cannot use their work wholesale,
because we don't know $B(S_q)$ in advance. For this reason, we cannot define $\ell$ as they have, and their
algorithm depends heavily on its use. To finish the proof of our theorem, we show how to modify
their techniques to handle this complication.

4.2 Merging 

Kaligosi et al. [KMMS05, Lemmas 5-10] count the comparisons resulting from merging. Lemmas
5, 6, and 7 do not depend on the value of $\ell$ and so we can use them in our analysis. Lemma 8
shows that the median-of-medians built on runs is a good pivot selection method. Although the
proof clearly uses the value of $\ell$, its validity does not depend on how large $\ell$ is; only that there are
at least $4\ell^2$ items in the interval, which also holds for our algorithm. Lemmas 9 and 10 together
will bound the number of comparisons by $B(S_q)(1 + o(1)) + O(n)$ if we can prove Lemma 2, which
bounds the information content of runs in intervals that are not yet partitioned.

Lemma 2. Let a run $r$ be a sorted sequence of elements from $A$ in a gap $\Delta_i^{P_t}$, where $|r|$ is its
length. Then,

$$\sum_{i=0}^{k} \sum_{r \in \Delta_i^{P_t}} |r|\lg|r| = o(B(S_t)) + O(n).$$

Proof. In a gap of size $\Delta$, $\ell = O(\log d)$, where $d$ is the recursion depth of the elements in the gap.
This gives $\sum_{r \in \Delta} |r|\log|r| \le \Delta\log(2\ell) = O(\Delta\log\log d)$, since each run has size at most $2\ell$. Because
we use a good pivot selection method, we know that the recursion depth of every element in
the gap is $O(\log(n/\Delta))$. Thus, $\sum_{i=0}^{k} \sum_{r \in \Delta_i^{P_t}} |r|\log|r| = O\big(\sum_i \Delta_i\log\log\log(n/\Delta_i)\big)$. Recall that
$B(S_t) = B(P_t) + O(n) = \sum_i \Delta_i\log(n/\Delta_i) + O(n)$. Using Fact 1, the proof is complete. □

4.3 Pivot Finding and Partitioning 

Now we prove that the cost of computing medians and performing partition requires at most
$o(B(S_q)) + O(n)$ comparisons. The algorithm computes the median $m$ of medians of each run at
a node $v$ in the pivot tree $T$. Then, it partitions each run based on $m$. We bound the number
of comparisons at each node $v$ with more than $4\ell^2$ elements in Lemmas 3 and 4. We bound the
comparison cost for all nodes with fewer elements in Lemma 5.

Terminology. Let $d$ be the current depth of the pivot tree $T$ (defined in Section 2.1), and let
the root of $T$ have depth $d = 0$. In tree $T$, each node $v$ is associated with some interval $I(p_v)$
corresponding to some pivot $p_v$. We define $\Delta_v = |I(p_v)|$ as the number of elements at node $v$ in $T$.

Recall that $\ell = 1 + \lfloor\log(d + 1)\rfloor$. Let a run be a sorted sequence of elements from $A$. We
define a short run as a run of length less than $\ell$. Let $\beta n$ be the number of comparisons required to
compute the exact median for $n$ elements, where $\beta$ is a constant less than three [DZ99]. Let $r_v^s$ be
the number of short runs at node $v$, and let $r_v^l$ be the number of long runs.

Lemma 3. The number of comparisons required to find the median $m$ of medians and partition all
runs at $m$ for any node $v$ in the pivot tree $T$ is at most
$\beta(\ell - 1) + \ell\log\ell + \beta(\Delta_v/\ell) + (\Delta_v/\ell)\log(2\ell)$.

Proof. We compute the cost (in comparisons) for computing the median of medians. For the
$r_v^s \le \ell - 1$ short runs, we need at most $\beta(\ell - 1)$ comparisons per node. For the $r_v^l \le \Delta_v/\ell$ long
runs, we need at most $\beta(\Delta_v/\ell)$.

Now we compute the cost for partitioning each run based on $m$. We perform binary search on
each run. For short runs, this requires at most $\sum_{i=1}^{\ell} \log i \le \ell\log\ell$ comparisons per node. For long
runs, we need at most $(\Delta_v/\ell)\log(2\ell)$ comparisons per node. □
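The partitioning step in this proof, one binary search per sorted run, can be sketched as follows. The function name is hypothetical, and elements equal to the pivot are placed on the right side, which is an arbitrary choice for this illustration.

```python
from bisect import bisect_left


def partition_runs(runs, m):
    """Split every sorted run at pivot m using one binary search per run."""
    left, right = [], []
    for run in runs:
        cut = bisect_left(run, m)        # O(log |run|) comparisons
        if cut > 0:
            left.append(run[:cut])       # elements smaller than m
        if cut < len(run):
            right.append(run[cut:])      # elements greater than or equal to m
    return left, right
```

Both output pieces of a run remain sorted, so they can be reused as runs on each side of the recursion.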

Since our value of $\ell$ changes at each level of the recursion tree, we will sum the above costs by
level. The overall cost in comparisons at level $d$ is at most

$$2^d\beta\ell + 2^d\ell\log\ell + (n/\ell)\beta + (n/\ell)\log(2\ell).$$

We can now prove the following lemma.

Lemma 4. The number of comparisons required to find the median of medians and partition over
all nodes $v$ in the pivot tree $T$ with at least $4\ell^2$ elements is at most $o(B(S_t)) + O(n)$.

Proof. For all levels of the pivot tree up to level $\ell' \le \log(B(P_t)/n)$, the cost is at most

$$\sum_{d=1}^{\log(B(P_t)/n)} \Big[ 2^d\ell(\beta + \log\ell) + (n/\ell)(\beta + \log(2\ell)) \Big].$$

Since $\ell = \lfloor\log(d + 1)\rfloor + 1$, the first term of the summation is bounded by
$(B(P_t)/n)\log\log(B(P_t)/n) = o(B(P_t))$. The second term is easily upper-bounded by

$$n\log(B(P_t)/n) \cdot \frac{\log\log\log(B(P_t)/n)}{\log\log(B(P_t)/n)} = o(B(P_t)).$$

Using Lemma 1, the above two bounds are $o(B(S_t)) + O(n)$.

For each level $\ell'$ with $\log(B(P_t)/n) < \ell' \le \log\log n + O(1)$, we need to bound the remaining
cost. It is easy to bound each node $v$'s cost by $o(\Delta_v)$, but this is not sufficient: though we have
shown that the total number of comparisons for merging is $B(S_t) + O(n)$, the number of elements
in nodes with $\Delta_v \ge 4\ell^2$ could be $\omega(B(S_t))$.

We bound the overall cost as follows, using the result of Lemma 3. Since node $v$ has $\Delta_v \ge 4\ell^2$
elements, we can rewrite the bounds as $O((\Delta_v/\ell)\log(2\ell))$. Recall that $\ell = \log d + O(1) =
\log(O(\log(n/\Delta_v))) = \log\log(n/\Delta_v) + O(1)$, since we use a good pivot selection method. Summing
over all nodes, we get $\sum_v (\Delta_v/\ell)\log(2\ell) \le \sum_v \Delta_v\log(2\ell) = o(B(P_t)) + O(n)$, using Fact 1 and
recalling that $B(P_t) = \sum_v \Delta_v\log(n/\Delta_v)$. Finally, using Lemma 1, we arrive at the claimed bound
for queries. □
Now we show that the comparison cost for all nodes $v$ where $\Delta_v \le 4\ell^2$ is at most $o(B(S_t)) + O(n)$.

Lemma 5. For nodes $v$ in the pivot tree $T$ where $\Delta_v \le 4\ell^2$, the total cost in comparisons for all
operations is at most $o(B(S_t)) + O(n)$.

Proof. We observe that nodes with no more than $4\ell^2$ elements do not incur any cost in comparisons
for median finding and partitioning, unless there is (at least) one associated query within the node.
Hence, we focus on nodes with at least one query.

Let $z = (\log\log n)^2\log\log\log n + O(1)$. We sort the elements of any node $v$ with $\Delta_v \le 4\ell^2$
elements using $O(z)$ comparisons, since $\ell \le \log\log n + O(1)$. We set each element as a pivot. The
total comparison cost over all such nodes is no more than $O(tz)$, where $t$ is the number of queries
we have answered so far. If $t < n/z$, then the above cost is $O(n)$.

Otherwise, $t \ge n/z$. Then, we know that $B(P_t) \ge (n/z)\log(n/z)$, by Jensen's inequality. (In
words, this represents the sort cost of $n/z$ adjacent queries.) Thus, $tz \in o(B(P_t))$. Using Lemma 1,
we know that $B(P_t) = B(S_t) + O(n)$, thus proving the lemma. □

5 Optimal Online Dynamic Multiselection

In this section, we extend our results for the case of the static array by allowing insertions and 
deletions of elements in the array, while supporting the selection queries. Recall that we are 
originally given the unsorted list A. For supporting insert and delete efficiently, we maintain the 
newly inserted elements in a separate data structure, and mark the deleted elements in A. These 
insert and delete operations are occasionally merged to make the array up-to-date. Let A' denote 
the current array with length n' . We want to support the following two additional operations: 

• insert(a), which inserts $a$ into $A'$; and

• delete(i), which deletes the $i$th (sorted) entry from $A'$.

5.1 Preliminaries 

Our solution uses the dynamic bitvector data structure of Hon et al. [HSS03]. This structure
supports the following set of operations on a dynamic bitvector $V$. The $\mathrm{rank}_b(i)$ operation tells the
number of $b$ bits up to the $i$th position in $V$. The $\mathrm{select}_b(i)$ operation gives the position in $V$ of the
$i$th $b$ bit. The $\mathrm{insert}_b(i)$ operation inserts the bit $b$ in the $i$th position. The delete(i) operation
deletes the bit located in the $i$th position. The flip(i) operation flips the bit in the $i$th position.

Note that one can determine the $i$th bit of $V$ by computing $\mathrm{rank}_1(i) - \mathrm{rank}_1(i - 1)$. (For
convenience, we assume that $\mathrm{rank}_b(-1) = 0$.) The result of Hon et al. [HSS03, Theorem 1] can
be re-stated as follows, for the case of maintaining a dynamic bitvector (the result of [HSS03] is
stated for a more general case).

Lemma 6 ([HSS03]). Given a bitvector $V$ of length $n$, there exists a data structure that takes
$n + o(n)$ bits and supports $\mathrm{rank}_b$ and $\mathrm{select}_b$ in $O(\log_t n)$ time, and insert, delete and flip in $O(t)$
time, for any parameter $t$ such that $(\log n)^{O(1)} \le t \le n$. The data structure assumes access to a
precomputed table of size $n^{\epsilon}$, for any fixed $\epsilon > 0$.
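To make the semantics of these operations concrete, here is a naive reference sketch. It is not the structure of Hon et al.: every operation takes linear time, and the class name and indexing conventions are assumptions (positions are 0-based, rank is inclusive of position $i$, and select counts occurrences starting from 1).

```python
class NaiveBitVector:
    """Linear-time reference model of a dynamic bitvector."""

    def __init__(self, bits=()):
        self.bits = [int(b) for b in bits]

    def rank(self, b, i):
        """Number of bits equal to b in positions 0..i inclusive; rank(b, -1) == 0."""
        return sum(1 for x in self.bits[:i + 1] if x == b)

    def select(self, b, i):
        """Position of the i-th bit equal to b, counting from i = 1."""
        seen = 0
        for pos, x in enumerate(self.bits):
            if x == b:
                seen += 1
                if seen == i:
                    return pos
        raise IndexError("fewer than i bits equal to b")

    def insert(self, b, i):
        """Insert bit b so that it occupies position i."""
        self.bits.insert(i, b)

    def delete(self, i):
        """Remove the bit at position i."""
        del self.bits[i]

    def flip(self, i):
        """Flip the bit at position i."""
        self.bits[i] ^= 1

    def access(self, i):
        """Recover bit i from two rank queries, as noted in the text."""
        return self.rank(1, i) - self.rank(1, i - 1)
```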

The elements in the array $A$ are swapped during the queries and the insert and delete operations to
create new pivots, and the positions of these pivots are maintained as before using the bitvector $B$.
In addition, we also maintain two bitvectors, each of length $n'$: (i) an insert bitvector $I$ such that
$I[i] = 1$ if and only if $A'[i]$ is newly inserted, and (ii) a delete bitvector $D$ such that if $D[i] = 1$, the
$i$th element in $A$ has been deleted. If a newly inserted item is deleted, it is removed from $I$ directly.
Both $I$ and $D$ are implemented as instances of the data structure of Lemma 6.

We maintain the values of the newly inserted elements in a balanced binary search tree $T$. The
inorder traversal of the nodes of $T$ corresponds to the increasing order of their positions in array
$A'$. We support the following operations on this tree: (i) given an index $i$, return the element
corresponding to the $i$th node in the inorder traversal of $T$, and (ii) insert or delete an element at a
given inorder position. By maintaining the subtree sizes of the nodes in $T$, these operations can be
performed in $O(\log n)$ time without having to perform any comparisons between the elements.
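The following sketch shows one way to get positional access and updates from subtree sizes alone. It swaps in a treap with random priorities as the balancing scheme, which is an assumption made for brevity (the text only requires some balanced tree), so the logarithmic bounds hold in expectation here. All names are hypothetical, positions are 0-based, and the only comparisons made are between sizes and priorities, never between stored elements.

```python
import random


class _Node:
    __slots__ = ("value", "prio", "left", "right", "size")

    def __init__(self, value):
        self.value = value
        self.prio = random.random()      # balancing key, unrelated to the stored value
        self.left = self.right = None
        self.size = 1


def _size(t):
    return t.size if t else 0


def _pull(t):
    t.size = 1 + _size(t.left) + _size(t.right)
    return t


def _split(t, k):
    """Split t into (first k nodes in inorder, remaining nodes)."""
    if t is None:
        return None, None
    if _size(t.left) >= k:
        a, b = _split(t.left, k)
        t.left = b
        return a, _pull(t)
    a, b = _split(t.right, k - _size(t.left) - 1)
    t.right = a
    return _pull(t), b


def _merge(a, b):
    """Concatenate two trees, keeping every node of a before every node of b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.prio > b.prio:
        a.right = _merge(a.right, b)
        return _pull(a)
    b.left = _merge(a, b.left)
    return _pull(b)


class PositionalTree:
    """Sequence supporting access, insertion, and deletion by inorder position."""

    def __init__(self):
        self.root = None

    def __len__(self):
        return _size(self.root)

    def insert_at(self, pos, value):
        a, b = _split(self.root, pos)
        self.root = _merge(_merge(a, _Node(value)), b)

    def delete_at(self, pos):
        a, b = _split(self.root, pos)
        mid, c = _split(b, 1)
        self.root = _merge(a, c)
        return mid.value

    def get(self, pos):
        t = self.root
        while t:
            ls = _size(t.left)
            if pos < ls:
                t = t.left
            elif pos == ls:
                return t.value
            else:
                pos -= ls + 1
                t = t.right
        raise IndexError(pos)
```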

Our preprocessing steps are the same as in the static case. In addition, the bitvectors $I$ and $D$
are each initialized to the bitvector of $n$ 0s. The tree $T$ is initially empty.

In addition, after performing $|A|$ insert and delete operations, we merge all the elements in $T$
with the array $A$, modify the bitvector $B$ appropriately, and reset the bitvectors $I$ and $D$ (with all
zeroes). This increases the amortized cost of the insert and delete operations by $O(1)$, without
requiring any additional comparisons.

5.2 Dynamic Online Multiselection 

We now describe how to support A'.insert(a), A'.delete(i), A'.select(i), and A'.search(a) operations.

A'.insert(a). First, we search for the appropriate unsorted interval $[\ell, r]$ containing $a$ using a binary
search on the original (unsorted) array $A$. Now perform A.search(a) on interval $[\ell, r]$ (choosing which
subinterval to expand based on the insertion key $a$) until $a$'s exact position $j$ in $A$ is determined.
The original array $A$ must have chosen as pivots the elements immediately to its left and right
(positions $j - 1$ and $j$ in array $A$); hence, one never needs to consider newly-inserted pivots when
choosing subintervals. Insert $a$ in sorted order in $T$ at position $I.\mathrm{select}_1(j)$ among all the
newly-inserted elements. Calculate $j' = I.\mathrm{select}_0(j)$, and set $a$'s position to $j'' = j' - B.\mathrm{rank}_1(j')$.
Finally, we update our bitvectors by performing $I.\mathrm{insert}_1(j'')$ and $D.\mathrm{insert}_0(j'')$. Note that, apart
from the search operation, the operations in the insertion procedure do not perform any
comparisons between the elements.

A'.delete(i). First compute $i' = D.\mathrm{select}_0(i)$. If $i'$ is newly-inserted (i.e., $I[i'] = 1$), then remove the
node (element) with inorder number $I.\mathrm{rank}_1(i')$ from $T$. Then perform $I.\mathrm{delete}(i')$ and $D.\mathrm{delete}(i')$.
If instead $i'$ is an older entry, simply perform $D.\mathrm{flip}(i')$. In other words, we mark the position $i'$ in
$A$ as deleted even though the corresponding element may not be in its proper place.⁵

A'.select(i). If $I[i] = 1$, return the element corresponding to the node with inorder number
$I.\mathrm{rank}_1(i)$ in $T$. Otherwise, compute $i' = I.\mathrm{rank}_0(i) - D.\mathrm{rank}_1(i)$, and return A.select($i'$).

A'.search(a). First, we search for the appropriate unsorted interval $[\ell, r]$ containing $a$ using a
binary search on the original (unsorted) array $A$. Then, perform A.search(a) on interval $[\ell, r]$ until
$a$'s exact position $j$ is found. If $a$ appears in array $A$ (which we discover through search), we need
to now check whether it has been deleted. We compute $j' = I.\mathrm{select}_0(j)$ and $j'' = j' - D.\mathrm{rank}_1(j')$.
If $D[j'] = 0$, return $j''$. Otherwise, it is possible that the item has been newly-inserted. Compute
$p = I.\mathrm{rank}_1(j')$, which is the number of newly-inserted elements that are less than or equal to $a$.
If $T[p] = a$, then return $j''$; otherwise, return failure.

5 If a user wants to delete an item with value a, one could simply search for it first to discover its rank, and then 
delete it using this function. 
We show that the above algorithm achieves the following performance (in Appendix C).

Theorem 3 (Optimal Online Dynamic Multiselection). Given a dynamic array $A'$ of $n$ original
elements, there exists a deterministic dynamic online data structure that supports $q = O(n)$ select, search,
insert, and delete operations, of which $q'$ are search, insert, or delete operations, using at most
$B(S_q)(1 + o(1)) + O(n + q' \log n)$ comparisons.
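
To get a feel for the leading term $B(S_q)$ in this bound, the sketch below evaluates the definition from the introduction, $B(S_q) = \log n! - \sum_{i=0}^{q} \log(\Delta_i!)$ with $\Delta_i = s_{i+1} - s_i$, $s_0 = 0$, and $s_{q+1} = n$. The function name is hypothetical, logarithms are taken base 2 (an assumption), and `math.lgamma` is used to avoid computing large factorials explicitly.

```python
import math

def multiselect_lower_bound(n, queries):
    """B(S_q) = log2(n!) - sum_i log2(gap_i!), using sentinels s_0 = 0 and s_{q+1} = n."""
    s = [0] + sorted(set(queries)) + [n]
    gaps = [b - a for a, b in zip(s, s[1:])]

    def log2_factorial(k):
        # lgamma(k + 1) = ln(k!), converted to base 2
        return math.lgamma(k + 1) / math.log(2)

    return log2_factorial(n) - sum(log2_factorial(g) for g in gaps)
```

With no queries the bound is zero, and when every rank is queried all gaps have size one, so the bound becomes $\log n!$, the sorting bound.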

6 External Online Multiselection 

Suppose we are given an unsorted array $A$ of length $N$ stored in $n = N/B$ blocks in external
memory. Recall that sorting $A$ in the external memory model requires $\mathrm{SortIO}(N) = \Theta(n \log_m n)$
I/Os. The techniques we use in main memory are not immediately applicable to the external memory
model. In the extreme case where we have $q = N$ queries, the internal memory solution would
require $O(n \log_2(n/m))$ I/Os. This compares poorly to the $O(n \log_m n)$ I/Os performed by
the optimal mergesort algorithm for external memory.

As in the case of internal memory, the lower bound on the number of I/Os required to perform
a given set of selection queries can be obtained by subtracting the number of I/Os required to sort
the elements between the 'query gaps' from the sorting bound. More specifically, let $S_t = \{s_i\}$ be
the first $t$ queries from a query set $Q$, sorted by position, and for $1 \le i \le t$, let $\Delta_i^{S_t} := s_{i+1} - s_i$ be
the query gaps, as defined in Section 2.1. Then the lower bound on the number of I/Os required
to support the queries in $S_t$ is given by

\[ B_m(S_t) := n \log_m n - \sum_{i=0}^{t} \left(\Delta_i^{S_t}/B\right) \log_m\left(\Delta_i^{S_t}/B\right) - O(n), \]

where we assume that $\log_m\left(\Delta_i^{S_t}/B\right) = 0$ when $\Delta_i^{S_t} < mB = M$ in the above definition.
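
The external memory bound can be evaluated in the same way as its internal counterpart. The sketch below computes the two leading terms of $B_m(S_t)$ and ignores the $-O(n)$ term. The function name is hypothetical, the sentinels $s_0 = 0$ and $s_{t+1} = N$ are assumed by analogy with the internal memory definition, and $N$, $B$, and $M$ are all measured in elements.

```python
import math

def external_lower_bound(N, B, M, queries):
    """Leading terms of B_m(S_t): n log_m n - sum_i (gap_i/B) log_m(gap_i/B)."""
    n, m = N / B, M / B

    def log_m(x):
        return math.log2(x) / math.log2(m)

    s = [0] + sorted(set(queries)) + [N]   # assumed sentinels s_0 = 0, s_{t+1} = N
    total = n * log_m(n)
    for a, b in zip(s, s[1:]):
        gap = b - a
        if gap >= M:                       # log_m(gap / B) is taken to be 0 when gap < M
            total -= (gap / B) * log_m(gap / B)
    return total
```

For example, with $N = 2^{20}$, $B = 2^5$, and $M = 2^{10}$, a single query at the median leaves two gaps of $2^{19}$ elements each, and the leading terms sum to roughly $6553.6$ I/Os.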

6.1 Algorithm Achieving $O(B_m(S_q)) + O(n)$ I/Os

We now show that our lower bound is asymptotically tight, by describing an $O(1)$-competitive
algorithm. We assume that $\log N = \log n + \log B = O(B)$, which allows us to store a pointer to a
block of the input using a constant number of blocks. This constraint is a reasonable assumption in
practice, and is similar to the word-size assumption of the transdichotomous word RAM model [FW93].
In addition, the algorithm of Sibeyn [Sib06] only works under this assumption, though this is not
explicitly mentioned. We obtain the following result for the external memory model.

Theorem 4. Given an unsorted array $A$ occupying $n$ blocks in external memory, we provide a
deterministic algorithm that supports a sequence $Q$ of $q$ online selection queries using $O(B_m(S_q)) +
O(n)$ I/Os under the condition that $\log N = O(B)$.

Proof. Our algorithm uses the same approach as the internal memory algorithm, except that it
chooses $d - 1$ pivots at once using Lemma 9. Hence, each node $v$ of the pivot tree $T$ containing $\Delta_v$
elements has a branching factor of $d$. It subdivides its $\Delta_v$ elements into $d$ partitions. Using
Lemma 10, we know this requires $2\delta_v + d$ I/Os, where $\delta_v = \Delta_v/B$.

We choose $d = m/2$, which satisfies the constraints for Lemmas 9 and 10. We also maintain the
bitvector $V$ of length $N$, as described before. For each $A.\mathrm{select}(i)$ query, we access position $V[i]$.
If $V[i] = 1$, return $A[i]$; otherwise, scan left and right from the $i$th position to find the endpoints of this
interval $I_i$ using $|I_i|/B$ I/Os. The analysis follows directly from the internal algorithm. □
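
The scan used by the select query in the proof above can be sketched as follows. This assumes that $V[i] = 1$ marks a position whose element is already in its final place (a pivot) and that intervals are delimited by the nearest marked positions; both the function names and the 0-indexed convention are hypothetical choices for illustration.

```python
def enclosing_interval(V, i):
    """Return (left, right): the nearest marked positions around position i (0-indexed)."""
    if V[i] == 1:
        return (i, i)                       # A[i] is already in its final position
    left = i
    while left > 0 and V[left] == 0:        # scan left until a marked position or the array start
        left -= 1
    right = i
    while right < len(V) - 1 and V[right] == 0:   # scan right symmetrically
        right += 1
    return (left, right)


def blocks_scanned(left, right, B):
    """Number of blocks of size B touched when scanning positions left..right."""
    return right // B - left // B + 1
```

The number of blocks touched is proportional to the interval length divided by $B$, matching the $|I_i|/B$ cost stated in the proof up to rounding.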

We extend this result to also support search, insert, and delete operations in Appendix D.






References 

[BF03] G. Brodal and R. Fagerberg. On the limits of cache-obliviousness. In Proceedings of 
the ACM Symposium on Theory of Computing, pages -, 2003. 

[BFP+73] Manuel Blum, Robert W. Floyd, Vaughan R. Pratt, Ronald L. Rivest, and Robert Endre
Tarjan. Time bounds for selection. J. Comput. Syst. Sci., 7(4):448-461, 1973.

[CFJ+09] Jean Cardinal, Samuel Fiorini, Gwenaël Joret, Raphaël M. Jungers, and J. Ian Munro.
An efficient algorithm for partial order production. In Proceedings of the 41st annual
ACM symposium on Theory of computing, STOC '09, pages 93-100, New York, NY,
USA, 2009. ACM.

[DM81] David P. Dobkin and J. Ian Munro. Optimal time minimal space selection algorithms. 
J. ACM, 28(3):454-461, 1981. 

[DZ99] Dorit Dor and Uri Zwick. Selecting the median. SIAM J. Comput., 28(5):1722-1758,
1999.

[FW93] Michael L. Fredman and Dan E. Willard. Surpassing the information theoretic bound 
with fusion trees. J. Comput. Syst. Sci., 47(3):424-436, 1993. 

[Hoa61] C. A. R. Hoare. Algorithm 65: find. Commun. ACM, 4(7):321-322, 1961. 

[HSS03] Wing-Kai Hon, Kunihiko Sadakane, and Wing-Kin Sung. Succinct data structures for 
searchable partial sums. In Proceedings of the International Symposium on Algorithms 
and Computation, pages 505-516, 2003. 

[JM10] Rosa M. Jiménez and Conrado Martínez. Interval sorting. In Proceedings of the
International Colloquium on Automata, Languages, and Programming, pages 238-249,
2010.

[KMMS05] Kanela Kaligosi, Kurt Mehlhorn, J. Ian Munro, and Peter Sanders. Towards optimal 
multiple selection. In ICALP, pages 103-114, 2005. 

[MP80] J. Ian Munro and Mike Paterson. Selection and sorting with limited storage. Theor. 
Comput. Sci., 12:315-323, 1980. 

[Pro95] Helmut Prodinger. Multiple Quickselect - Hoare's Find algorithm for several elements.
Inf. Process. Lett., 56(3):123-129, 1995.

[Sib06] Jop F. Sibeyn. External selection. J. Algorithms, 58(2):104-117, 2006.

[SPP76] Arnold Schönhage, Mike Paterson, and Nicholas Pippenger. Finding the median. J.
Comput. Syst. Sci., 13(2):184-199, 1976.






A Randomized Algorithm 



Our pivot-choosing method is simple and randomized. We choose $2m$ elements at random from an
interval of size $\Delta$, sort them (or use a median-finding algorithm) to find the median, and use that
as our pivot. We wish to set values of $m$ and $t$ such that the following three events happen:

• At least $2t$ elements are chosen in an interval of size $2\Delta/\log\Delta$ about the median of the
interval.

• Between $m - t$ and $m + t$ elements are chosen less than the median.

• Between $m - t$ and $m + t$ elements are chosen larger than the median.

If we can show that all events happen with probability $1 - O(1/n^2)$, then we end up with the
median of our $2m$ elements being a pivot at position $\frac{1}{2}(1 + O(1/\log\Delta))$, which is a good pivot.

Note that the last two events are mirror images of one another, and so have the same probability
of occurring.
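
The pivot-choosing step itself is short. The following is a minimal sketch, assuming sampling without replacement and taking the lower median of the sample (the text does not fix either choice); the function name and the `rng` parameter are hypothetical.

```python
import random

def sampled_median_pivot(interval, m, rng=random):
    """Sample 2m elements from the interval (a list) and return the sample's median."""
    k = min(2 * m, len(interval))              # cannot sample more elements than exist
    sample = sorted(rng.sample(interval, k))   # assumption: sampling without replacement
    return sample[(k - 1) // 2]                # lower median of the sorted sample
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible. When $2m$ is at least the interval length, the sample is the whole interval and the function returns its exact median.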

First Event. This is the simpler case to estimate. A randomly chosen element fails
to land in the middle interval with probability $1 - 2/\log\Delta = \exp[-(2/\log\Delta)(1 + o(1))]$. If we
choose at least $(1.1)\log\Delta \log n$ elements, all fail to land in this middle interval with probability
$(1 - 2/\log\Delta)^{(1.1)\log\Delta\log n} = \exp[-(2.2)\log n\,(1 + o(1))] = O(1/n^2)$. Since we need $2t$ elements in
the interval, it suffices for $2m \ge (2.2)\,t\log\Delta\log n$, or $m \ge (1.1)\,t\log\Delta\log n$.

Second (and third) Event. We need a bound on the sum of the first k binomial coefficients. 

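The following bound and proof are attributed to Lovász:

Lemma 7. Let $0 \le k < m$ and define $c := \binom{2m}{k+1} \big/ \binom{2m}{m}$. Then

\[ \sum_{i=0}^{k} \binom{2m}{i} < c \cdot 2^{2m-1}. \]

Proof. Write $k + 1 = m - t$. Define

\[ A := \sum_{i=0}^{m-t-1} \binom{2m}{i}, \qquad B := \sum_{i=m-t}^{m-1} \binom{2m}{i}. \]

By the definition of $c$ we have

\[ \binom{2m}{m-t} = c \binom{2m}{m}, \]

and, because the growth rate of one binomial coefficient to the next slows as we approach $\binom{2m}{m}$, we
have

\[ \binom{2m}{m-t-1} < c \binom{2m}{m-1}, \]

and thus

\[ \binom{2m}{m-t-j} < c \binom{2m}{m-j} \]

for $0 < j \le m - t$.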
Thus it follows that the sum of any $t$ consecutive binomial coefficients is less than $c$ times the
sum of the next $t$ coefficients as long as we stay on the left-hand side of Pascal's triangle. Thus
$A < cB + c^2 B + c^3 B + \cdots < \frac{c}{1-c} B$. We also have $A + B < 2^{2m-1}$. Combining these we have

\[ A < \frac{c}{1-c} B < \frac{c}{1-c}\left(2^{2m-1} - A\right). \]

Solving for $A$ completes the proof. □



We then bound

\[ \binom{2m}{m-t} \Big/ \binom{2m}{m} \le e^{-t^2/(m+t)}. \]

This can be derived from Stirling's formula and Taylor series estimates for the exponential and
logarithm functions. We then obtain that

Lemma 8. Let $0 \le t \le m$. Then

\[ \sum_{i=0}^{m-t-1} \binom{2m}{i} < 2^{2m-1} e^{-t^2/(m+t)}. \]
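
As a sanity check, a tail bound of the form $\sum_{i=0}^{m-t-1} \binom{2m}{i} < 2^{2m-1} e^{-t^2/(m+t)}$, which is the form the probability estimate that follows relies on, can be verified numerically for small parameters. The sketch below does this with exact integer binomial coefficients; the function names are hypothetical, and a finite check of this kind is an illustration, not a proof.

```python
import math

def left_tail(m, t):
    """Sum of the binomial coefficients C(2m, i) for i = 0 .. m - t - 1."""
    return sum(math.comb(2 * m, i) for i in range(m - t))

def tail_bound(m, t):
    """Right-hand side 2^(2m-1) * exp(-t^2 / (m + t))."""
    return 2 ** (2 * m - 1) * math.exp(-t * t / (m + t))

def bound_holds_up_to(max_m):
    """Check the inequality for every 1 <= m <= max_m and 0 <= t <= m."""
    return all(left_tail(m, t) < tail_bound(m, t)
               for m in range(1, max_m + 1)
               for t in range(m + 1))
```

For instance, with $m = 2$ and $t = 1$ the left side is $\binom{4}{0} = 1$ and the right side is $8e^{-1/3} \approx 5.73$.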

Since choosing an element from an interval at random and observing if it falls before or after
the median is an event of probability 1/2, the event of choosing $2m$ elements and having fewer than
$m - t$ fall below the median occurs with probability at most

\[ 2^{-2m} \sum_{i=0}^{m-t-1} \binom{2m}{i}. \]

By our lemma above, this is bounded by $(1/2)\exp[-t^2/(m+t)]$. Thus, the probability there are
between $m - t$ and $m + t$ elements below the median is at least $1 - \exp[-t^2/(m+t)]$ by the symmetry
of Pascal's triangle. To obtain $1 - O(1/n^2)$ we need $t^2/(m+t) \ge 2\log n$, or $t \ge \sqrt{2m\log n}\,(1 + o(1))$.

Using our lower bound for $m$ in terms of $t$ above, we conclude that $m = 6(\log n)^3(\log\Delta)^2$ and
$t = 4(\log n)^2\log\Delta$ meet our needs.
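
The two requirements on $m$ and $t$ derived above, namely $m \ge (1.1)\,t\log\Delta\log n$ and $t^2/(m+t) \ge 2\log n$, can be checked directly for concrete sizes. The sketch below does so using natural logarithms, which is an assumption since the text does not state the base; the function names are hypothetical.

```python
import math

def sampling_parameters(n, delta):
    """Return (m, t) with m = 6 (log n)^3 (log delta)^2 and t = 4 (log n)^2 log delta."""
    ln_n, ln_d = math.log(n), math.log(delta)
    m = 6 * ln_n ** 3 * ln_d ** 2
    t = 4 * ln_n ** 2 * ln_d
    return m, t

def constraints_hold(n, delta):
    """Check m >= 1.1 t log(delta) log(n) and t^2 / (m + t) >= 2 log(n)."""
    ln_n, ln_d = math.log(n), math.log(delta)
    m, t = sampling_parameters(n, delta)
    first = m >= 1.1 * t * ln_d * ln_n
    second = t * t / (m + t) >= 2 * ln_n
    return first, second
```

The first constraint reduces to $6 \ge 4.4$ and so always holds; the second reduces to $\log n \cdot \log\Delta \ge 2$, which holds for all but tiny inputs.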

Theorem 5. Given a list of elements of length $\Delta \le n$, with $\Delta$ at least $6(\log n)^3(\log\Delta)^2$, with
probability at least $1 - O(1/n^2)$, if we sample $6(\log n)^3(\log\Delta)^2$ of the $\Delta$ elements uniformly at
random, then the median of the sample falls in position $\Delta/2 \pm \Delta/\log\Delta$ in the original list.

B Proof of Lemma 1 (Entropy Lemma)

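Proof. Consider any two consecutive selection queries $s$ and $s'$, and let $\Delta = s' - s$ be the gap
between them. Let $P_\Delta = (p_l, p_{l+1}, \ldots, p_r)$ be the pivots in this gap, where $p_l = s$ and $p_r = s'$. The
lemma follows from the claim that $B(P_\Delta) = O(\Delta)$, since

\begin{align*}
B(P_t) - B(S_t) &= \left(n\log n - \sum_{j=0}^{k} \Delta_j^{P_t}\log\Delta_j^{P_t}\right) - \left(n\log n - \sum_{i=0}^{t} \Delta_i^{S_t}\log\Delta_i^{S_t}\right) \\
&= \sum_{i=0}^{t} \Delta_i^{S_t}\log\Delta_i^{S_t} - \sum_{j=0}^{k} \Delta_j^{P_t}\log\Delta_j^{P_t} \\
&= \sum_{i=0}^{t} B\left(P_{\Delta_i^{S_t}}\right) = \sum_{i=0}^{t} O\left(\Delta_i^{S_t}\right) = O(n).
\end{align*}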
We now proceed to prove our claim.

There must be a unique pivot in $P_\Delta$ of minimal recursion depth. Any pair of pivots with
the same recursion depth must have a common ancestor, and this ancestor must lie between the
pair. This ancestor is in $P_\Delta$ and it has smaller recursion depth than the pair. Let $p_m$ denote the
pivot of minimum depth. (Note that $p_m = s$ or $p_m = s'$ are possible.) As before, define the gaps
$\Delta_i = p_{i+1} - p_i$ for $l \le i < r$. We split the gap $\Delta$ at $p_m$. We address the right side first, and the
argument for the left side is similar.

The sequence $d(p_m), d(p_{m+1}), \ldots, d(p_{r-1})$ must be strictly increasing. Otherwise, one of these
pivots must be a leaf in the pivot tree, and hence a query, which is a contradiction.

Now consider $I(p_{m+1})$. This interval must have $p_m$ as its left endpoint, due to its smaller
recursion depth. Its right endpoint must have recursion depth shallower than $p_{m+1}$, and hence
it contains all pivots up to and including $p_r$. This means $I(p_i) \subseteq I(p_{m+1})$ for $m + 1 \le i < r$,
$\Delta_i = p_{i+1} - p_i \le |I(p_i)|$ for $m + 1 \le i < r$, and it means that $\Delta \le |I(p_{m-1})| + |I(p_{m+1})|$.

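For brevity, we define $D_l = \sum_{i=l}^{m-1} \Delta_i$ and $D_r = \sum_{i=m}^{r-1} \Delta_i$, giving $\Delta = D_l + D_r$, and further
$D_l \le |I(p_{m-1})|$, $D_r \le |I(p_{m+1})|$. Let us also define $\alpha_i := D_r/\Delta_i$ for $m \le i < r$. We have

\[ D_r\log D_r - \sum_{i=m}^{r-1} \Delta_i\log\Delta_i = \sum_{i=m}^{r-1} \Delta_i\log(D_r/\Delta_i) = D_r\sum_{i=m}^{r-1} (\log\alpha_i)/\alpha_i. \]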
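
The identity above follows from $\sum_i \Delta_i = D_r$ and $\Delta_i = D_r/\alpha_i$, and it can be checked numerically on an arbitrary list of positive gaps. The sketch below compares $D\log D - \sum_i \Delta_i\log\Delta_i$ with $D\sum_i (\log\alpha_i)/\alpha_i$, where $D$ is the sum of the gaps; the function names are hypothetical and base-2 logarithms are an assumption (the identity holds in any base).

```python
import math

def entropy_side(gaps):
    """D log D - sum_i gap_i log gap_i, where D is the total of the gaps."""
    D = sum(gaps)
    return D * math.log2(D) - sum(g * math.log2(g) for g in gaps)

def alpha_side(gaps):
    """D * sum_i (log alpha_i) / alpha_i, where alpha_i = D / gap_i."""
    D = sum(gaps)
    alphas = [D / g for g in gaps]
    return D * sum(math.log2(a) / a for a in alphas)
```

For the gaps $(1, 2, 4, 9)$, both sides evaluate to approximately $25.47$.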

This quantity can be bounded from above with a lower bound on $\alpha_i$. Write $D_r = b \cdot |I(p_{m+1})|$ for
a constant $b$ with $0 < b \le 1$. So we have

\[ \alpha_i = D_r/\Delta_i \ge D_r/|I(p_i)| = b\,|I(p_{m+1})|/|I(p_i)|. \]

Since we are using a good pivot selection method, we get the bound

\[ |I(p_i)| \le |I(p_{m+1})| \cdot c^{\,d(p_i) - d(p_{m+1}) + O(1)}. \]

Plugging in gives us $\alpha_i \ge b \cdot c^{-d(p_i) + d(p_{m+1}) + O(1)} \ge b \cdot c^{\,m+1-i+O(1)}$. The last inequality used the
fact that the recursion depths must be strictly increasing. Then

\[ \sum_{i=m}^{r-1} \frac{\log\alpha_i}{\alpha_i} \le \sum_{j=0}^{r-1-m} \frac{\log\left(b\,c^{-j+O(1)}\right)}{b\,c^{-j+O(1)}} = O(1), \]

since the terms of the right-hand sum decrease geometrically. And thus

\[ D_r\log D_r - \sum_{i=m}^{r-1} \Delta_i\log\Delta_i = O(D_r). \]

A similar argument on the left side gives

\[ D_l\log D_l - \sum_{i=l}^{m-1} \Delta_i\log\Delta_i = O(D_l). \]

Finally, $\Delta\log\Delta - D_r\log D_r - D_l\log D_l = O(\Delta)$, and the proof is complete. □



C Proof of Theorem 3

Let $A'$ denote the current array of length $n'$, after a sequence of queries and insertions. Let $Q$ be the
sequence of $q$ selection operations performed (either directly or indirectly through other operations)
on $A'$, ordered by time of arrival. Let $S_q$ be the queries of $Q$, ordered by position. We now analyze
the number of comparisons performed by a sequence of queries and insert and delete operations.

We consider the case when the number of insert and delete operations is less than $n$. In other
words, we are between two rebuildings of our dynamic data structure. If $q'$ is the number of
search, insert, and delete operations in the sequence, then we perform $O(q'\log n')$ comparisons
to perform the required searches. Note that our algorithm does not perform any comparisons for
delete($i$) operations, until some other query is in the same interval as $i$. The deleted element will
participate in the other costs (merging, pivot-finding, and partitioning) for these other queries, but
its contribution can be bounded by $O(\log n)$, which we have as a credit.

Since a delete operation does not perform any additional comparisons beyond those needed to
perform a search, we assume that all the updates are insertions in the rest of this section. Since
each inserted element becomes a pivot immediately, it does not contribute to the comparison cost
of any other select operation. Also, note that in the algorithm of Theorem 2, no pivot is part of a
run and hence cannot affect the choice of any future pivot.

Since $Q$ is essentially a set of $q$ selection queries, we can bound its total comparison cost for
selection queries by Theorem 2, which gives a bound of $B(S_q)(1 + o(1)) + O(n)$. This proves the
theorem.

D External Online Multiselection

Suppose we are given an unsorted array $A$ of length $N$ stored in $n = N/B$ blocks in external
memory. Recall that sorting $A$ in the external memory model requires $\mathrm{SortIO}(N) = \Theta(n \log_m n)$
I/Os. The techniques we use in main memory are not immediately applicable to the external memory
model. In the extreme case where we have $q = N$ queries, the internal memory solution would
require $O(n \log_2(n/m))$ I/Os. This compares poorly to the $O(n \log_m n)$ I/Os performed by
the optimal mergesort algorithm for external memory.

D.l A Lower Bound for Multiselect in External Memory 

As in the case of internal memory, the lower bound on the number of I/Os required to perform a
given set of selection queries can be obtained by subtracting the number of I/Os required to sort
the elements between the 'query gaps' from the sorting bound. More specifically, let $S_t = \{s_i\}$ be
the first $t$ queries from a query set $Q$, sorted by position, and for $1 \le i \le t$, let $\Delta_i^{S_t} := s_{i+1} - s_i$ be
the query gaps, as defined in Section 2.1. Then the lower bound on the number of I/Os required
to support the queries in $S_t$ is given by

\[ B_m(S_t) := n \log_m n - \sum_{i=0}^{t} \left(\Delta_i^{S_t}/B\right) \log_m\left(\Delta_i^{S_t}/B\right) - O(n), \]

where we assume that $\log_m\left(\Delta_i^{S_t}/B\right) = 0$ when $\Delta_i^{S_t} < mB = M$ in the above definition.

D.2 Partitioning in External Memory

The main difference between our algorithms for internal and external memory is the partitioning
procedure. In the internal memory algorithm, we partition the values according to a single pivot,
recursing on the half that contains the answer. In the external memory algorithm, we modify
this binary partition to a $d$-way partition, for some $d = \Theta(m)$, by finding a sample of $d$ "roughly
equidistant elements." The next two lemmas describe how to find such a sample, and then partition
the range of values into $d + 1$ subranges with respect to the sample.

Lemma 9. Given an unsorted array $A$ containing $N$ elements in external memory and an integer
parameter $d \le M/\log N$, one can compute a sample of size $d$ from $A$ using $n = N/B$ I/Os, such
that the rank of the $j$th value in this sample is within $[j(N/d) - d, (j + \log(N/d) - 1)N/d]$.

Proof. Given an unordered sequence of $N$ elements (stored in a read-only memory) and an additional
working space of size $S = \Omega((\log N)^2)$, Munro and Paterson [MP80] showed how to compute
a "reasonably well-spaced sample" of size $s \le S/\log N$, in a single sequential scan over the
sequence. This well-spaced sample has the property that the rank of the $j$th element of the sample
among the initial sequence of elements is between $(jN/s) - 1$ and $(j + \log(N/s) - 1)N/s$.

The algorithm only reads the input sequence from left to right, and it does not perform any
random accesses to the sequence. Note that we have access to an unbounded working space in the
external memory at the cost of additional I/Os. If $M = o((\log N)^2)$, we can use the space on the
disk as temporary working space. Hence, it is easy to see that this algorithm can be translated to
the external memory model with $S = \max\{M, (\log N)^2\}$, which gives the result stated. □

Lemma 10. Given an unsorted array $A$ occupying $n$ pages of external memory and $d \le M/(2B)$
sample elements stored in main memory, there is an algorithm to partition $A$ by those values in
$2n + d$ I/Os.

Proof. The algorithm scans the data, keeping one input block and $d + 1$ output blocks in main
memory. An output block is written to external memory when it is full, or when the scan is
complete. The algorithm performs $n$ I/Os to read the input, and at most $n + d + 1$ I/Os to write
the output into $d + 1$ partitions. □
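
The scan described in this proof can be simulated in memory to count block reads and writes. The following sketch assumes the input is given as a list of blocks (Python lists of at most $B$ elements) and the sample as a sorted list of pivots; the function name and the convention that an element equal to a pivot goes to the partition on that pivot's right are hypothetical choices.

```python
import bisect

def partition_by_sample(blocks, pivots, B):
    """Scan the input blocks once, routing each element to one of d + 1 output buffers.

    Returns (partitions, reads, writes); partitions[k] is the list of blocks written
    for the k-th subrange.
    """
    d = len(pivots)                              # pivots must be sorted
    buffers = [[] for _ in range(d + 1)]         # one in-memory output block per partition
    partitions = [[] for _ in range(d + 1)]      # blocks written to "external memory"
    reads = writes = 0
    for block in blocks:
        reads += 1                               # one I/O per input block
        for x in block:
            k = bisect.bisect_right(pivots, x)   # number of pivots <= x
            buffers[k].append(x)
            if len(buffers[k]) == B:             # output block is full: write it out
                partitions[k].append(buffers[k])
                buffers[k] = []
                writes += 1
    for k, buf in enumerate(buffers):            # flush partially filled blocks at the end
        if buf:
            partitions[k].append(buf)
            writes += 1
    return partitions, reads, writes
```

At most $n$ full blocks are written during the scan and at most $d + 1$ partial blocks are flushed at the end, which matches the write count in the proof.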

D.3 Algorithm Achieving $O(B_m(S_q)) + O(n)$ I/Os

We now show that our lower bound is asymptotically tight, by describing an $O(1)$-competitive
algorithm. We assume that $\log N = \log n + \log B = O(B)$, which allows us to store a pointer to a
block of the input using a constant number of blocks. This constraint is a reasonable assumption in
practice, and is similar to the word-size assumption of the transdichotomous word RAM model [FW93].
In addition, the algorithm of Sibeyn [Sib06] only works under this assumption, though this is not
explicitly mentioned.

Theorem 6. Given an unsorted array $A$ occupying $n$ blocks in external memory, we provide a
deterministic algorithm that supports a sequence $Q$ of $q$ online selection queries using $O(B_m(S_q)) +
O(n)$ I/Os under the condition that $\log N = O(B)$.

Proof. Our algorithm uses the same approach as the internal memory algorithm, except that it
chooses $d - 1$ pivots at once using Lemma 9. Hence, each node $v$ of the pivot tree $T$ containing $\Delta_v$
elements has a branching factor of $d$. It subdivides its $\Delta_v$ elements into $d$ partitions. Using
Lemma 10, we know this requires $2\delta_v + d$ I/Os, where $\delta_v = \Delta_v/B$.

We choose $d = m/2$, which satisfies the constraints for Lemmas 9 and 10. We also maintain the
bitvector $V$ of length $N$, as described before. For each $A.\mathrm{select}(i)$ query, we access position $V[i]$.
If $V[i] = 1$, return $A[i]$; otherwise, scan left and right from the $i$th position to find the endpoints of this
interval $I_i$ using $|I_i|/B$ I/Os. The analysis follows directly from the internal algorithm. □

To add searches, we cannot afford to spend $\log n$ time performing binary search on the blocks
of B. To handle this case, we build a B-tree $T$ maintaining all pivots from $A$. (During preprocessing,
we insert $A[1]$ and $A[n]$ into $T$.) The B-tree $T$ will be used to support search queries in $O(\log_B N)$
I/Os instead of $O(\log N)$ I/Os. We modify the proof of Theorem 6 to obtain the following:

Corollary 1. Given an unsorted array $A$ occupying $n$ blocks in external memory, we provide a
deterministic algorithm that supports a sequence $Q$ of $q$ online selection and search queries using
$O(B_m(S_q)) + O(\min\{qm, N\}\log_B N) + O(n)$ I/Os under the condition that $\log N = O(B)$.

Combining the ideas from Corollary 1 and Theorem 3, we can dynamize the above algorithm.

Corollary 2. Given an unsorted array $A$ occupying $n$ blocks in external memory, we provide a
deterministic algorithm that supports a sequence $Q$ of $q$ online select, search, insert, and delete operations
using $O(B_m(S_q)) + O(\min\{qm, N\}\log_B N) + O(n)$ I/Os under the condition that $\log N = O(B)$.